Enterprise Database Systems
Statistics for Data Science #2
Data Science Statistics: Applied Inferential Statistics
Data Science Statistics: Using Python to Compute & Visualize Statistics

Data Science Statistics: Applied Inferential Statistics

Course Number:
it_dssds2dj_02_enus
Lesson Objectives

  • Course Overview
  • test a hypothesis about a sample by comparing it to the general population using the one-sample t-test available in the SciPy library
  • compare a sample with another independent sample using the independent t-test, and with a related sample using the paired t-test, both available in the SciPy library
  • apply independent t-tests on a real dataset to test a hypothesis that managers at a firm have higher salaries than non-managerial employees
  • work with Pandas and Matplotlib to analyze the stock price of Volkswagen in 2008, which was affected by some extreme events
  • compute the skewness and kurtosis of the returns on Volkswagen stock in 2008 and recognize how a few days of extreme behavior increased those numbers
  • perform pre-processing operations on a dataset containing close prices for stocks and indices to analyze it using linear regression
  • use the scikit-learn library to fit a linear regression model on the returns on a stock and the returns on the S&P 500 index
  • use two explanatory variables - the returns on the S&P 500 index and on an index tracking the strength of the US Dollar - to perform a regression on the returns on individual stocks
  • recall the different types of t-tests and identify the values they return, calculate percentage returns from time series data using Pandas, and measure the skew and kurtosis values for a series
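
The returns, skewness, kurtosis, and regression objectives above can be sketched together as follows. This is a minimal illustration, not the course's own code: the prices are synthetic, and the column names index_close and stock_close are hypothetical stand-ins for a real dataset of close prices.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable

# Synthetic daily returns (an assumption for illustration): the stock moves
# about 1.2x the index, plus its own noise.
index_moves = rng.normal(0.0, 0.01, size=250)
stock_moves = 1.2 * index_moves + rng.normal(0.0, 0.005, size=250)

# Hypothetical close prices built from those returns.
prices = pd.DataFrame({
    "index_close": 100 * np.cumprod(1 + index_moves),
    "stock_close": 50 * np.cumprod(1 + stock_moves),
})

# Percentage returns; the first row has no previous price, so it is dropped.
returns = prices.pct_change().dropna()

# Pandas reports sample skewness and excess kurtosis (a normal curve is near 0).
skewness = returns["stock_close"].skew()
excess_kurtosis = returns["stock_close"].kurt()

# Regress stock returns on index returns; scikit-learn expects a 2-D X.
model = LinearRegression()
model.fit(returns[["index_close"]], returns["stock_close"])
beta = model.coef_[0]
r_squared = model.score(returns[["index_close"]], returns["stock_close"])

print(f"skew = {skewness:.3f}, excess kurtosis = {excess_kurtosis:.3f}")
print(f"beta = {beta:.3f}, R^2 = {r_squared:.3f}")
```

A second explanatory variable, such as returns on a US Dollar index, would be added as another column in the DataFrame passed as X.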

Overview/Description

Explore how different t-tests can be performed by using the SciPy library for hypothesis testing in this 10-video course, which continues your exploration of data science. This beginner-level course assumes prior experience with Python programming, along with an understanding of terms such as skewness and kurtosis and of concepts from inferential statistics, such as t-tests and regression. Begin by learning how to perform three different t-tests (the one-sample t-test, the independent or two-sample t-test, and the paired t-test) on various samples of data using the SciPy library. Next, explore how to interpret the results to accept or reject a hypothesis. The course covers, as an example, how to fit a regression model of the returns on an individual stock against the returns on the S&P 500 index by using the scikit-learn library. Finally, watch demonstrations of measuring skewness and kurtosis in a dataset. The closing exercise asks you to list three different types of t-tests, identify the values that t-tests return, and write code to calculate percentage returns from time series data using Pandas.
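
As a rough sketch of the three t-tests named above, the SciPy calls might look like the following. The samples are randomly generated for illustration only, and the variable names are hypothetical. Each call returns a test statistic and a p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Made-up samples (assumptions for illustration, not course data).
before = rng.normal(loc=50.0, scale=5.0, size=30)
after = before + rng.normal(loc=2.0, scale=1.0, size=30)   # same subjects, measured again
other_group = rng.normal(loc=55.0, scale=5.0, size=40)     # unrelated subjects

# One-sample t-test: does the mean of `before` differ from a population mean of 50?
one_sample = stats.ttest_1samp(before, popmean=50.0)

# Independent (two-sample) t-test: do two unrelated groups have the same mean?
independent = stats.ttest_ind(before, other_group)

# Paired t-test: did the same subjects change between two measurements?
paired = stats.ttest_rel(before, after)

for name, result in [("one-sample", one_sample),
                     ("independent", independent),
                     ("paired", paired)]:
    print(f"{name}: t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

A small p-value (commonly below 0.05) is usually read as evidence against the null hypothesis of equal means.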



Target

Prerequisites: none

Data Science Statistics: Using Python to Compute & Visualize Statistics

Course Number:
it_dssds2dj_01_enus
Lesson Objectives

  • Course Overview
  • create and configure simple graphs with lines and markers using the Matplotlib data visualization library
  • use the NumPy library to manipulate arrays and the Pandas library to load and analyze a dataset
  • generate histograms and pie charts to analyze distributions and create scatter plots to plot the relationship between two variables in a dataset
  • apply Python native functions such as max() and sum() to summarize distributions and visualize these values using Matplotlib
  • use NumPy to compute statistics such as the mean and median on your data
  • calculate statistics such as the mode and the standard error of the mean using the SciPy library, and compute more statistics such as variance and values at various percentiles using NumPy
  • use NumPy to compute the correlation and covariance of two distributions and visualize their relationship with scatterplots
  • standardize a distribution to express its values as z-scores and use Pandas to generate a correlation and covariance matrix for your dataset
  • create and configure a graph using Matplotlib, enumerate the details conveyed in a box plot, compute statistical values using NumPy functions, and compute the correlations between all pairs of columns in a Pandas dataframe
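
The summary statistics listed in the objectives above can be sketched with NumPy, SciPy, and Pandas as follows. The heights and weights are invented for illustration, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Invented measurements (assumptions for illustration only).
heights = np.array([150.0, 160.0, 160.0, 170.0, 180.0])
weights = np.array([50.0, 58.0, 61.0, 68.0, 80.0])

# Central tendency and spread with NumPy.
mean = np.mean(heights)
median = np.median(heights)
variance = np.var(heights)                     # population variance (ddof=0)
p25, p75 = np.percentile(heights, [25, 75])

# Mode and standard error of the mean with SciPy.
# np.atleast_1d keeps this working whether mode is returned as a scalar or an array.
mode_value = np.atleast_1d(stats.mode(heights).mode)[0]
standard_error = stats.sem(heights)            # uses the sample std (ddof=1)

# Correlation and covariance of two distributions with NumPy.
correlation = np.corrcoef(heights, weights)[0, 1]
covariance = np.cov(heights, weights)[0, 1]    # sample covariance (ddof=1)

# The same pairwise values for every pair of columns with Pandas.
df = pd.DataFrame({"height": heights, "weight": weights})
corr_matrix = df.corr()
cov_matrix = df.cov()

print(mean, median, variance, mode_value, standard_error)
print(corr_matrix)
```

Note that np.var defaults to the population variance, while stats.sem and DataFrame.cov use sample (ddof=1) formulas.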

Overview/Description

Learners continue their exploration of data science in this 10-video course, which deals with using the NumPy, Pandas, and SciPy libraries to perform various statistical summary operations on real datasets. This beginner-level course assumes some prior experience with Python programming and an understanding of basic statistical concepts such as mean, standard deviation, and correlation. The course opens by exploring different ways to visualize data by using the Matplotlib library, including univariate and bivariate distributions. Next, you will move to computing descriptive statistics for distributions, such as variance and standard error, by using the NumPy, Pandas, and SciPy libraries. Learn about the concept of the z-score, in which every value in a distribution is expressed as the number of standard deviations it lies from the mean. Then cover the computation of the z-score for a series using SciPy. In the closing exercise, you will use the Matplotlib data visualization library to create a graph through three points given by their coordinates, then enumerate all of the details that are conveyed in a box plot.
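
To make the z-score idea concrete, here is a minimal sketch that standardizes a small, made-up series by hand and with SciPy. The values are an assumption chosen so that the mean is 5 and the standard deviation is 2.

```python
import numpy as np
from scipy import stats

# Made-up values with mean 5.0 and population standard deviation 2.0.
values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Manual z-score: distance from the mean, in units of standard deviation.
z_manual = (values - values.mean()) / values.std()

# The same computation with SciPy (population std, ddof=0, by default).
z_scipy = stats.zscore(values)

print(z_manual)
print(z_scipy)
```

After standardizing, the series has a mean of 0 and a standard deviation of 1, which makes values from different distributions comparable.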



Target

Prerequisites: none
